
Add Engram model structure integration (v1) #3689

Draft
ilml wants to merge 2 commits into NVIDIA:dev from ilml:tolong/engram

Conversation


@ilml ilml commented Mar 4, 2026

Summary

  • Integrates DeepSeek's Engram n-gram hash embedding module into Megatron-LM as a new model type under megatron/core/models/engram/
  • Extends GPTModel with Engram-augmented transformer layers that inject gated n-gram embeddings before self-attention at configurable layer positions
  • Includes builder, layer specs, and a pretrain_engram.py entry point following existing Mcore patterns (similar to the Mamba integration)
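The core idea in the summary, injecting gated n-gram hash embeddings before self-attention, can be sketched in a few lines. This is an illustrative toy only: the real implementation lives in megatron/core/models/engram/engram_module.py, and every name below (ngram_hash_ids, gated_ngram_inject, the rolling-hash constants) is hypothetical.

```python
# Toy sketch of gated n-gram hash embedding injection; hypothetical names,
# numpy instead of torch for self-containment.
import numpy as np

def ngram_hash_ids(token_ids, n=2, table_size=97, seed=0x9E3779B1):
    """Map each position's trailing n-gram of token IDs to a hash bucket.

    Only tokens at or before position t are hashed, so the mapping is
    causal (no lookahead).
    """
    ids = []
    for t in range(len(token_ids)):
        h = seed
        for tok in token_ids[max(0, t - n + 1): t + 1]:
            h = (h * 1000003 + tok) % (1 << 32)   # simple rolling hash
        ids.append(h % table_size)
    return np.array(ids)

def gated_ngram_inject(hidden, table, hash_ids, gate):
    """Add gated n-gram embeddings to hidden states before self-attention."""
    return hidden + gate * table[hash_ids]        # [S, H] + [S, H]

S, H, V = 6, 4, 97
rng = np.random.default_rng(0)
hidden = rng.standard_normal((S, H))
table = rng.standard_normal((V, H))     # hash-embedding table
tokens = [5, 9, 5, 9, 2, 7]
ids = ngram_hash_ids(tokens)
out = gated_ngram_inject(hidden, table, ids, gate=0.1)
print(out.shape)  # (6, 4)
```

Note that identical trailing n-grams (here the bigram 5, 9 at positions 1 and 3) map to the same bucket, which is what lets the embedding table memorize recurring local patterns.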

Components

| File | Purpose |
| --- | --- |
| megatron/core/models/engram/engram_module.py | Core: CompressedTokenizer, NgramHashMapping, MultiHeadEmbedding, ShortConv, EngramModule |
| megatron/core/models/engram/engram_layer.py | EngramTransformerLayer: extends TransformerLayer with a pre-attention Engram step |
| megatron/core/models/engram/engram_model.py | EngramGPTModel: extends GPTModel with hash pre-computation |
| megatron/core/models/engram/engram_layer_specs.py | Layer spec factory for Engram layers |
| engram_builders.py | Model builder |
| pretrain_engram.py | Training entry point with Engram CLI args |

Design decisions

  • Extends GPTModel (not LanguageModule) since Engram is fundamentally GPT + extra module in specific layers
  • Two-phase forward: hash IDs are pre-computed at the model level (CPU/numpy), embeddings are cached in each EngramModule and consumed during the layer forward; this avoids modifying TransformerBlock's interface
  • HC_MULT (hyper-connection multiplier) is handled internally within EngramModule via expand/collapse, so it stays compatible with the standard [S, B, H] tensor flow
  • No core Megatron files are modified; the integration is pure inheritance plus the ModuleSpec system
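The two-phase forward described above can be sketched as a cache-and-consume pattern. The class and method names here (ToyEngramModule, precompute) are hypothetical stand-ins for the real EngramModule/EngramGPTModel wiring; the point is only that the model fills each module's cache before the ordinary layer loop runs, so the layer signature never changes.

```python
# Sketch of the two-phase forward: phase 1 precomputes and caches the
# n-gram embeddings at the model level; phase 2 is the unchanged layer loop,
# where each Engram layer consumes its cached slice. Hypothetical names.
import numpy as np

class ToyEngramModule:
    def __init__(self):
        self._cached = None

    def precompute(self, emb):          # phase 1: called at model level
        self._cached = emb

    def __call__(self, hidden):         # phase 2: called inside layer forward
        assert self._cached is not None, "precompute() must run first"
        out = hidden + self._cached
        self._cached = None             # consume the cache
        return out

class ToyModel:
    def __init__(self, n_layers, engram_layer_ids):
        self.n_layers = n_layers
        self.engram = {i: ToyEngramModule() for i in engram_layer_ids}

    def forward(self, hidden, ngram_emb):
        for mod in self.engram.values():   # phase 1: fill every cache
            mod.precompute(ngram_emb)
        for i in range(self.n_layers):     # phase 2: ordinary layer loop
            if i in self.engram:
                hidden = self.engram[i](hidden)
        return hidden

h = np.zeros((3, 2))
model = ToyModel(n_layers=4, engram_layer_ids=[1, 3])
out = model.forward(h, np.ones((3, 2)))
print(out.sum())   # 12.0: two Engram layers each added a [3, 2] block of ones
```

Because the cache is filled out-of-band, the per-layer call signature stays identical to a plain transformer layer, which is how the design avoids touching TransformerBlock.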

Known limitations (v1)

  • HC dimension is local per Engram layer (not persisted across layers)
  • Engram embedding tables are not tensor-parallel-sharded
  • Hash computation runs on CPU (numpy)
  • The sympy dependency was replaced with a pure-Python primality test
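On the last point: the PR does not show its primality code, but a pure-Python replacement for sympy's check could look like the standard deterministic Miller-Rabin below (illustrative; the function names are hypothetical). Prime moduli are a common choice for sizing hash-embedding tables, since they reduce collisions from structured IDs.

```python
# Deterministic Miller-Rabin primality test (standard algorithm); a sketch
# of what a pure-Python replacement for sympy's isprime might look like.
def is_prime(n):
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    # These witnesses make the test deterministic for all n < 3.3e24.
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(n):
    """Smallest prime >= n, handy for choosing a hash-table modulus."""
    while not is_prime(n):
        n += 1
    return n

print(next_prime(100))  # 101
```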

Test plan

  • Verify model construction with --engram-layer-ids argument
  • Single-GPU forward pass sanity check
  • Compare Engram module output shapes against reference implementation
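For the first test-plan item, a minimal sketch of how an --engram-layer-ids flag might be parsed and validated follows. The flag name comes from the PR; the parsing helper and its behavior (comma-separated, deduplicated, range-checked) are assumptions, since pretrain_engram.py's actual CLI wiring is not shown here.

```python
# Hypothetical sketch of parsing/validating --engram-layer-ids; the real
# argument handling in pretrain_engram.py may differ.
import argparse

def parse_layer_ids(text, num_layers):
    """Parse '2,5,8' into sorted unique ints, validated against num_layers."""
    ids = sorted({int(x) for x in text.split(",") if x.strip()})
    bad = [i for i in ids if not 0 <= i < num_layers]
    if bad:
        raise ValueError(f"engram layer ids out of range: {bad}")
    return ids

parser = argparse.ArgumentParser()
parser.add_argument("--engram-layer-ids", default="")
parser.add_argument("--num-layers", type=int, default=12)
args = parser.parse_args(["--engram-layer-ids", "2,5,8"])
print(parse_layer_ids(args.engram_layer_ids, args.num_layers))  # [2, 5, 8]
```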

Made with Cursor

Integrate DeepSeek's Engram n-gram hash embedding module into Megatron-LM.
This initial version focuses on model structure only, extending GPTModel
with Engram-augmented transformer layers that inject gated n-gram embeddings
before self-attention at configurable layer positions.

Key components:
- EngramModule: n-gram hashing, multi-head embedding, gated value projection,
  causal short convolution with hyper-connection multiplier
- EngramTransformerLayer: extends TransformerLayer with pre-attention Engram
- EngramGPTModel: extends GPTModel with hash pre-computation from input_ids
- Layer specs, builder, and pretrain entry point following Mcore patterns
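The "causal short convolution" mentioned for EngramModule can be sketched as a small convolution over the sequence axis with left-only padding, so position t never sees tokens after t. This is a toy illustration (hypothetical names, numpy for clarity), not the ShortConv implementation itself.

```python
# Toy causal short convolution over the sequence axis: each output position
# mixes only the current and the previous K-1 positions. Hypothetical names.
import numpy as np

def causal_short_conv(x, kernel):
    """x: [S, H]; kernel: [K, H]. Output t depends on x[t-K+1..t] only."""
    S, H = x.shape
    K = kernel.shape[0]
    padded = np.concatenate([np.zeros((K - 1, H)), x], axis=0)  # left-pad
    out = np.zeros_like(x)
    for t in range(S):
        out[t] = (padded[t:t + K] * kernel).sum(axis=0)
    return out

x = np.arange(8, dtype=float).reshape(4, 2)   # S=4, H=2
k = np.ones((2, 2))                           # K=2: a causal moving sum
y = causal_short_conv(x, k)
print(y[0].tolist())  # [0.0, 1.0]: the first step sees only x[0]
```

The zero left-padding is what enforces causality; a production version would typically be a depthwise Conv1d with the same padding scheme.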

Made-with: Cursor

copy-pr-bot bot commented Mar 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

